How to Deal with Large Dataset, Class Imbalance and Binary Output in SVM based Response Model

Authors

  • Hyunjung Shin
  • Sungzoon Cho
Abstract

Support Vector Machine (SVM) employs the Structural Risk Minimization (SRM) principle to generalize better than conventional machine learning methods that employ the traditional Empirical Risk Minimization (ERM) principle. When applying SVM to response modeling in direct marketing, however, one has to deal with three practical difficulties: large training data, class imbalance, and binary SVM output. This paper proposes ways to alleviate or solve these difficulties through informative sampling, the use of different costs for different classes, and the use of distance to the decision boundary. It also provides various evaluation measures for response models in terms of accuracy, lift chart analysis, and computational efficiency.
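Two of the remedies named in the abstract, class-dependent costs and scoring by distance to the decision boundary, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an already-fitted linear model with hypothetical weights `w` and bias `b`, a toy dataset, and illustrative helper names (`signed_distance`, `weighted_hinge_loss`, `lift_at`). A kernel SVM would use its decision function value in place of the linear score.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def weighted_hinge_loss(w, b, X, y, cost_pos, cost_neg):
    """Hinge loss in which each class carries its own misclassification cost."""
    total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        cost = cost_pos if yi == 1 else cost_neg
        total += cost * max(0.0, 1.0 - margin)
    return total

def lift_at(scores, labels, fraction):
    """Response rate in the top-scored fraction divided by the overall rate."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    k = max(1, int(round(fraction * len(scores))))
    top_rate = sum(1 for i in order[:k] if labels[i] == 1) / k
    base_rate = sum(1 for label in labels if label == 1) / len(labels)
    return top_rate / base_rate

# Hypothetical fitted model and toy customers (+1 = responder, -1 = non-responder).
w, b = [3.0, 4.0], -5.0
X = [[3.0, 4.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
y = [1, -1, -1, 1]

# A continuous score replaces the binary SVM output, so customers can be ranked.
scores = [signed_distance(w, b, x) for x in X]

# Raising the cost of the responder class penalizes missed responders more heavily.
equal_cost_loss = weighted_hinge_loss(w, b, X, y, cost_pos=1.0, cost_neg=1.0)
weighted_loss = weighted_hinge_loss(w, b, X, y, cost_pos=3.0, cost_neg=1.0)

print(scores, equal_cost_loss, weighted_loss, lift_at(scores, y, 0.25))
```

Ranking by a continuous score is what makes a lift chart possible: a binary output alone cannot order customers within the predicted-responder group.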


Similar resources

Research on Credit Card Fraud Detection Model Based on Class Weighted Support Vector Machine

To deal with credit card fraud, a detection model based on a Class Weighted Support Vector Machine was established. Because the data are large-scale and high-dimensional, Principal Component Analysis (PCA) was first applied to screen out the main factors from a large number of indicative attributes, effectively reducing the training dimension of the SVM. Then according to the characteristics of cre...


Extracting Predictor Variables to Construct Breast Cancer Survivability Model with Class Imbalance Problem

Application of data mining methods as a decision support system has a great benefit to predict survival of new patients. It also has a great potential for health researchers to investigate the relationship between risk factors and cancer survival. But due to the imbalanced nature of datasets associated with breast cancer survival, the accuracy of survival prognosis models is a challenging issue...


Vehicle Make and Model Recognition Using a Set of Discriminative Parts

In fine-grained recognition, the main category of object is well known and the goal is to determine the subcategory or fine-grained category. Vehicle make and model recognition (VMMR) is a fine-grained classification problem. It includes several challenges like the large number of classes, substantial inner-class and small inter-class distance. VMMR can be utilized when license plate numbers ca...


ADABOOST ENSEMBLE ALGORITHMS FOR BREAST CANCER CLASSIFICATION

With advances in technology, different tumor features have been collected for Breast Cancer (BC) diagnosis. Processing such large data sets poses challenges, including high storage requirements and the time required for accessing and processing the data. The objective of this paper is to classify BC based on the extracted tumor features. To extract useful information and diagnose the tumo...


The Use of the Binary Bat Algorithm in Improving the Accuracy of Breast Cancer Diagnosis

Introduction: The early diagnosis of breast cancer, a prevalent cancer among women, is a necessity in cancer research since it could simplify the clinical management of other patients. The importance of the classification of breast cancer patients into high- or low-risk groups has led research groups in the biomedical and informatics departments to evaluate and use computer techniques s...




Publication date: 2004